Hierarchical subspace models for contingency tables
Abstract

For statistical analysis of multiway contingency tables we propose modeling interaction
terms in each maximal compact component of a hierarchical model. By this approach we can
search for parsimonious models with smaller degrees of freedom than the usual hierarchical
model, while preserving conditional independence structures in the hierarchical model. We
discuss estimation and exact tests of the proposed model and illustrate the advantage of
the proposed modeling with some data sets.

Keywords : context specific interaction model, divider, Markov bases, split model, uniform 
association model. 



1 Introduction 

Modeling of the interaction term is an important topic for two-way contingency tables,
because there is a large gap between the independence model and the saturated model.
This problem is clearly of importance for contingency tables with three or more factors.
However, modeling strategies for higher order interaction terms have not been fully
discussed in the literature. In this paper we propose modeling interaction terms of multiway
contingency tables by considering each maximal compact component of a hierarchical
model.

For two-way contingency tables the uniform association model (Goodman [1979, 1985])
and the RC association model (Goodman [1979, 1985], Kuriki [2005]) are often used
for modeling interaction terms. In the analysis of agreement among raters, where data
are summarized as square contingency tables with the same categories, many models
with interaction in diagonal elements and their extension to multiway tables have been
considered (e.g. Tanner and Young [1985], Tomizawa [2009]). Hirotsu [1997] proposed
a two-way change point model and Hara et al. [2009] generalized it to a subtable sum
model. For multiway contingency tables Højsgaard [2003] considered the split model as
a generalization of graphical models. The context specific interaction model defined by
Højsgaard [2004] is a more general model than the split model.



We give a unified treatment of these models as submodels of hierarchical models.
Malvestuto and Moscarini [2000] showed that a hierarchical model possesses a compaction,
i.e. the variables are grouped into maximal compact components separated by dividers.
Variables of different maximal compact components separated by a divider are conditionally
independent given the variables of the divider. Furthermore the likelihood function
factors as a rational function of marginal likelihoods, where the numerator corresponds to
maximal compact components and the denominator corresponds to dividers. By this
factorization, statistical inference on a hierarchical model can be localized to each maximal
compact component. In the case of a decomposable model, maximal compact components
and dividers reduce to maximal cliques and minimal vertex separators of a chordal graph,
respectively, and the above factorization is well known (e.g. Section 4.4 of Lauritzen
[1996]).

In a usual hierarchical model each maximal interaction effect is saturated, i.e. there
is no restriction on the parameters for maximal interaction effects. However, as in the
two-way tables, we can consider submodels for interaction effects. In the modeling
process, it is advantageous to treat each maximal compact component of a hierarchical model
separately and to keep the compaction of the hierarchical model. By respecting the
compaction of the hierarchical model, the conditional independence property and the localization
property of the hierarchical model are preserved. We call the resulting model a hierarchical
subspace model. We prove some properties of our proposed model and illustrate its
advantage with some data sets.

The organization of the paper is as follows. For the rest of this section, as a motivating
example, we consider a submodel of the conditional independence model for three-way
contingency tables. In Section 2 we define the hierarchical subspace model and discuss the
localization of inference through the decomposition of the model into maximal extended
compact components. We also discuss maximum likelihood estimation of the proposed
model. In Section 3 we study the split model in the framework of this paper. In Section
4 we present construction of Markov bases for conditional tests of the model. Fitting of
the proposed model to several real data sets is presented in Section 5. Some concluding
remarks are given in Section 6.



1.1 A motivating example: subspace conditional independence 
model for three-way tables 

As an illustration of hierarchical subspace models we discuss a submodel of the conditional
independence model for three-way tables. Consider an $I \times J \times K$ contingency table
and let $p_{ijk}$ denote the probability of the cell $(i,j,k)$. Marginal probabilities are
denoted by $p_{i++}$, $p_{ij+}$, etc. Similar notation is used for the frequencies
$x = \{x_{ijk}\}$ of the contingency table, and the sample size is denoted by $n = x_{+++}$.
Consider the conditional independence model $i \perp\!\!\!\perp k \mid j$, which is expressed by

    $\log p_{ijk} = a_{ij} + b_{jk}$.    (1)

In the usual conditional independence model, the $a_{ij}$'s and $b_{jk}$'s are free parameters. Now
suppose that we have known functions $\phi_{ij}$ depending on $i$ and $j$ and $\psi_{jk}$ depending on
$j$ and $k$. Separating main effects, consider the following submodel of the conditional
independence model

    $\log p_{ijk} = \alpha_i + \beta_j + \gamma_k + \delta \phi_{ij} + \delta' \psi_{jk}$.    (2)

The parameters of this model are $\{\alpha_i\}_{i=1}^{I}$, $\{\beta_j\}_{j=1}^{J}$, $\{\gamma_k\}_{k=1}^{K}$ and $\delta$, $\delta'$.
The uniform association model is specified by $\phi_{ij} = ij$. The two-way change point model
in Hirotsu [1997] is specified by

    $\phi_{ij} = 1$ if $i \le I_1$ and $j \le J_1$, and $\phi_{ij} = 0$ otherwise,

where $1 \le I_1 < I$, $1 \le J_1 < J$. Similarly we can specify $\psi_{jk}$ according to many well
known models.

As a submodel of the conditional independence model, $i \perp\!\!\!\perp k \mid j$ holds for (2) and we
can write

    $p_{ijk} = \dfrac{p_{ij+}\, p_{+jk}}{p_{+j+}}$.

Moreover, since $\{\beta_j\}_{j=1}^{J}$ are free parameters, the model is saturated for the main effect
of the second factor. This implies that the maximum likelihood estimate (MLE) of the
model is obtained as

    $\hat p_{ijk} = \dfrac{\hat p_{ij+}\, \hat p_{+jk}}{\hat p_{+j+}}$,    (3)

where $\hat p_{+j+} = x_{+j+}/n$. Here $\hat p_{ij+}$ is the MLE of the following model for the marginal probability

    $\log p_{ij+} = \alpha_i + \beta_j + \delta \phi_{ij}$    (4)

and $\hat p_{ij+}$ only depends on the marginal frequencies $\{x_{ij+}\}$. Similarly $\hat p_{+jk}$ is the MLE for
the $(j,k)$-marginal probabilities:

    $\log p_{+jk} = \beta_j + \gamma_k + \delta' \psi_{jk}$.    (5)

In this way the maximum likelihood estimation of the model (2) is localized to estimations
of two marginal models.
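
The localization can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes hypothetical random counts, takes the uniform association choices $\phi_{ij} = ij$ and $\psi_{jk} = jk$, and fits the two marginal models (4) and (5) with a generic Newton iteration for Poisson log-linear models. The helper names `design` and `fit_loglinear` are illustrative only.

```python
import numpy as np

def design(phi):
    """Design matrix for log m_ab = const + row effect + column effect + delta * phi_ab."""
    A, B = phi.shape
    r, c = np.indices((A, B))
    r, c = r.ravel(), c.ravel()
    cols = [np.ones(A * B)]
    cols += [(r == a).astype(float) for a in range(1, A)]
    cols += [(c == b).astype(float) for b in range(1, B)]
    cols.append(phi.ravel().astype(float))
    return np.column_stack(cols)

def fit_loglinear(y, X, iters=200, tol=1e-10):
    """Poisson log-linear MLE by Newton's method with step halving; returns fitted means."""
    def loglik(b):
        eta = X @ b
        return y @ eta - np.exp(eta).sum()
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(iters):
        mu = np.exp(X @ beta)
        step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
        for _ in range(30):               # halve the step until the likelihood does not decrease
            if loglik(beta + step) >= loglik(beta):
                break
            step = step / 2
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return np.exp(X @ beta)

rng = np.random.default_rng(0)
I, J, K = 3, 3, 3
x = rng.integers(1, 20, size=(I, J, K)).astype(float)     # hypothetical counts
n = x.sum()
phi = np.outer(np.arange(1, I + 1), np.arange(1, J + 1))  # assumed phi_ij = i * j
psi = np.outer(np.arange(1, J + 1), np.arange(1, K + 1))  # assumed psi_jk = j * k

x12, x23, x2 = x.sum(axis=2), x.sum(axis=0), x.sum(axis=(0, 2))
p12 = fit_loglinear(x12.ravel(), design(phi)).reshape(I, J) / n   # MLE of model (4)
p23 = fit_loglinear(x23.ravel(), design(psi)).reshape(J, K) / n   # MLE of model (5)
p2 = x2 / n                                                       # saturated j-marginal
p_hat = p12[:, :, None] * p23[None, :, :] / p2[None, :, None]     # combination as in (3)
```

Because the main effect of the second factor is free in both marginal models, each fitted margin reproduces $x_{+j+}/n$, so `p_hat` sums to one and its $(i,j)$-marginal equals `p12`.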

In our model (2) it is important to note that $\delta$ and $\delta'$ are free parameters. Consider
an additional constraint $H : \delta = \delta'$ to (2):

    $\log p_{ijk} = \alpha_i + \beta_j + \gamma_k + \delta(\phi_{ij} + \psi_{jk})$.    (6)

This model is still log-affine and contained in the conditional independence model.
However, because both the $(i,j)$-marginals and the $(j,k)$-marginals are relevant for the
estimation of the common value of $\delta$, we cannot localize estimation of the parameters to
two marginal tables. This consideration shows that it is convenient to set up a log-affine
model, which is "conformal" to the conditional independence model. We will appropriately
define the notion of conformality in Section 2.

In our model we can also allow certain patterns of structural zeros. Consider the data
on song sequence of a wood pewee in Section 7.5.2 of Bishop et al. [1975]. The data is
shown in Table 1. The wood pewee has a repertoire of four distinctive phrases. The
observed data consists of 198 triplets of consecutive phrases $(i, j, k) \in \{1, 2, 3, 4\}^3$. It is a
$4 \times 4 \times 4$ contingency table with the cells of the form $(i, i, k)$ and $(i, j, j)$ being
structural zeros.



Table 1: Triples of phrases in a song sequence of a wood pewee, with repeats deleted. 



Third place 
First place Second place A B C D 
A A — — ~ — 



B 19 — 2 2 

C 2 26 — 

D 12 5 — 

B A — 9 6 12 

B _ _ _ _ 

C 24 1 ~ 1 

D 12 — 

C A — 4 22 

B 3 — 22 

C _ _ _ _ 

D 10 — 

D A — 11 4 

B 5 — 11 

C — 

D _ _ _ _ 



Source: Craig [1943]



The model considered in Bishop et al. [1975] is written as

    $p_{ijk} = 1_{\{i \ne j\}}\, e^{a_{ij}}\, 1_{\{j \ne k\}}\, e^{b_{jk}}$,    (7)

where $a_{ij}$ and $b_{jk}$ are free parameters. With some abuse of notation (7) can be written as

    $\log p_{ijk} = a_{ij} 1_{\{i \ne j\}} + (-\infty) 1_{\{i = j\}} + b_{jk} 1_{\{j \ne k\}} + (-\infty) 1_{\{j = k\}}$.

As in (2), this model is a conditional independence model and also it is saturated for the
main effect of the second factor. Therefore the MLE for this model is again expressed as
(3). An appropriate handling of $(-\infty)$ and further analysis are given in Section 5.



In Section 3 we also consider the split model defined by Højsgaard [2003] as an
important example of the hierarchical subspace model. Here we give an example of the
split model for the three-way table $\{x_{ijk}\}$ by

    $\log p_{ijk} = a_i + b_j + c_k + (d_{ij} + e_{jk})\, 1_{\{j \in J_1\}}$,

where $J_1$ is a subset of $\{1, \dots, J\}$. This model means that the interaction between $i$ and
$j$ (resp. $j$ and $k$) exists only if $j \in J_1$. The general definition of split models is given in
Section 3 and a numerical example of it is given in Section 5.

We now consider the conditional goodness of fit test of the model based on the Markov
basis methodology (Diaconis and Sturmfels [1998]). Assume that the $\phi_{ij}$'s and $\psi_{jk}$'s in (2) are
integers. Furthermore suppose that Markov bases $\mathcal{B}_{12}$ and $\mathcal{B}_{23}$ are already obtained for the two
marginal models (4) and (5). Following the notation in Section 2 of Dobra and Sullivant
[2004], write a particular move $z = \{z(i,j)\}$ of degree $d$ in the Markov basis $\mathcal{B}_{12}$ for (4)
as

    $z = [\{(i_1, j_1), \dots, (i_d, j_d)\} \,\|\, \{(i'_1, j_1), \dots, (i'_d, j_d)\}]$,    (8)

where $(i_1, j_1), \dots, (i_d, j_d)$ are cells (with replication) of positive elements of $z$ and
$(i'_1, j_1), \dots, (i'_d, j_d)$ are cells of negative elements of $z$. By replication we mean that the same cell
$(i,j)$ is repeated $|z(i,j)|$ times in (8). Extend the move $z$ to a three-way table as

    $z^{k_1,\dots,k_d} = [\{(i_1, j_1, k_1), \dots, (i_d, j_d, k_d)\} \,\|\, \{(i'_1, j_1, k_1), \dots, (i'_d, j_d, k_d)\}]$,

where $k_1, \dots, k_d \in \{1, \dots, K\}$ are arbitrary. Similarly for a move

    $z = [\{(j_1, k_1), \dots, (j_d, k_d)\} \,\|\, \{(j_1, k'_1), \dots, (j_d, k'_d)\}] \in \mathcal{B}_{23}$

let

    $z^{i_1,\dots,i_d} = [\{(i_1, j_1, k_1), \dots, (i_d, j_d, k_d)\} \,\|\, \{(i_1, j_1, k'_1), \dots, (i_d, j_d, k'_d)\}]$.

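Let $\mathcal{B}_{\{1,2\},\{2,3\}}$ denote a Markov basis for the conditional independence model (1). Following
the argument in Dobra and Sullivant [2004] we easily see that

    $\mathcal{B} = \mathcal{B}_{\{1,2\},\{2,3\}} \cup \{z^{k_1,\dots,k_{\deg z}} \mid z \in \mathcal{B}_{12},\ 1 \le k_1, \dots, k_{\deg z} \le K\} \cup \{z^{i_1,\dots,i_{\deg z}} \mid z \in \mathcal{B}_{23},\ 1 \le i_1, \dots, i_{\deg z} \le I\}$

is a Markov basis for (2). Therefore the problem of conditional test of the model is
also localized to two marginal models.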
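
The extension of a two-way move can be illustrated with a short sketch. It assumes $I = J = K = 3$ and the uniform association choices $\phi_{ij} = ij$, $\psi_{jk} = jk$; the particular degree-3 move and the function name `extend_move` are illustrative choices, and the move is only claimed to be a valid move for (4), not an element of any specific Markov basis.

```python
import numpy as np

def extend_move(pos, neg, ks, shape):
    """Lift a two-way move on (i, j) to a three-way integer array.

    pos, neg: equal-length lists of 0-based (i, j) cells, paired so that
              pos[t] and neg[t] share the level j of the second factor.
    ks:       one arbitrary level k for each pair.
    """
    z = np.zeros(shape, dtype=int)
    for (i, j), (i2, j2), k in zip(pos, neg, ks):
        assert j == j2, "paired cells must share the level of the second factor"
        z[i, j, k] += 1
        z[i2, j2, k] -= 1
    return z

I = J = K = 3
phi = np.outer(np.arange(1, I + 1), np.arange(1, J + 1))  # assumed phi_ij = i * j
psi = np.outer(np.arange(1, J + 1), np.arange(1, K + 1))  # assumed psi_jk = j * k

# A degree-3 move for the marginal model (4): row sums, column sums and
# the phi-weighted sum all vanish.
pos = [(0, 0), (2, 1), (1, 2)]
neg = [(1, 0), (0, 1), (2, 2)]
z3 = extend_move(pos, neg, ks=[0, 2, 1], shape=(I, J, K))

z12 = z3.sum(axis=2)   # (i, j)-marginal: the original two-way move
z23 = z3.sum(axis=0)   # (j, k)-marginal: identically zero by construction
```

Since each positive cell is paired with a negative cell at the same $(j, k)$, the $(j,k)$-marginal of the lifted array vanishes whatever levels are chosen in `ks`, and every sufficient statistic of (2) is left unchanged.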

For the rest of this paper, we generalize the above results to a log-affine model. 






2 Hierarchical subspace models and their decompositions

In this section we give a definition of hierarchical subspace models for $I_1 \times \cdots \times I_m$
contingency tables and discuss their decomposition by partial edge separators. In Section
2.1 we define a hierarchical subspace model. In Section 2.2 we prove that for any log-affine
model there exists a natural smallest decomposable model, such that the log-affine
model is a hierarchical subspace model of the decomposable model. In Section 2.3 we
discuss properties of hierarchical models containing a given log-affine model.






2.1 Definition of a hierarchical subspace model 

We give a definition of a hierarchical subspace model and also discuss the localization of
maximum likelihood estimation of the proposed model. We follow the notation of Lauritzen
[1996] and Malvestuto and Moscarini [2000].

Let $V = \mathbb{R}^{I_1 \times \cdots \times I_m}$ denote the set of $I_1 \times \cdots \times I_m$ tables with real entries, where
$I_j \ge 2$ for all $j$. $V$ is considered as an $I_1 \times \cdots \times I_m$-dimensional real vector space of
functions (tables) from $\mathcal{I} = [I_1] \times \cdots \times [I_m]$ to $\mathbb{R}$, where $[J]$ denotes $\{1, \dots, J\}$. A
probability distribution over $\mathcal{I}$ is denoted by $\{p(i), i \in \mathcal{I}\}$. Let $L$ be a linear subspace
of $V$ containing the constant function $1 \in L$. A log-affine model specified by $L$ is given
by $\log p(\cdot) \in L$, where $\log p(\cdot)$ denotes $\{\log p(i), i \in \mathcal{I}\}$. In the following we only consider
linear subspaces of $V$ containing the constant function 1.

Let $D$ be a subset of $[m] = \{1, \dots, m\}$. $i_D = \{i_j, j \in D\}$ is a $D$-marginal cell, and $p(i_D)$
denotes the marginal probability of a probability distribution $p(\cdot)$. Similarly $x(i_D)$ denotes
the marginal frequency of the contingency table $x = \{x(i), i \in \mathcal{I}\}$. As in Sei et al. [2008]
let

    $L_D = \{\psi \in V \mid \psi(i_1, \dots, i_m) = \psi(i'_1, \dots, i'_m) \text{ if } i_h = i'_h,\ \forall h \in D\}$

denote the set of functions depending only on $i_D$. $L_D$ is considered as $\mathbb{R}^{\mathcal{I}_D}$, where
$\mathcal{I}_D = \prod_{h \in D} [I_h]$. Hence we have $L_{[m]} = V$. For a subspace $L$ and $D \subset [m]$, we say that $D$ is
saturated in $L$ if $L_D \subset L$. $D$ is saturated in $L$ if and only if the sufficient statistic for $L$
fixes all the $D$-marginals of the contingency table. Note that if $D$ is saturated in $L$, then
every $E \subset D$ is saturated in $L$.

Let $\Delta$ denote a simplicial complex on $[m]$ and let $\mathrm{red}\,\Delta$ denote the set of maximal
elements, i.e. facets, of $\Delta$. Then the hierarchical model associated with $\Delta$ is defined as

    $\log p(\cdot) \in L_\Delta = \sum_{D \in \mathrm{red}\,\Delta} L_D$,

where the right-hand side is the summation of vector spaces. We note that $\mathrm{red}\,\Delta$ is
considered as a hypergraph. Here we summarize some notions on hypergraphs according
to Malvestuto and Moscarini [2000]. A subset of a hyperedge of $\mathrm{red}\,\Delta$ is called a partial
edge. A partial edge $S$ is a separator of $\mathrm{red}\,\Delta$ if the subhypergraph of $\mathrm{red}\,\Delta$ induced by
$[m] \setminus S$ is disconnected. A partial edge separator $S$ of $\mathrm{red}\,\Delta$ is called a divider if there
exist two vertices $u, v \in [m]$ that are separated by $S$ but by no proper subset of $S$. If
two vertices $u, v \in [m]$ are not separated by any partial edge, $u$ and $v$ are called tightly
connected. A subset $C \subset [m]$ is called a compact component if any two variables in $C$ are
tightly connected. Denote the set of maximal compact components of $\mathrm{red}\,\Delta$ by $\mathcal{C}$. Then
there exists a sequence of maximal compact components $C_1, \dots, C_{|\mathcal{C}|}$ such that

    $(C_1 \cup \cdots \cup C_{k-1}) \cap C_k = S_k$

and $S_k$, $k = 2, \dots, |\mathcal{C}|$, are dividers of $\mathrm{red}\,\Delta$. We denote $\mathcal{S} = \{S_2, \dots, S_{|\mathcal{C}|}\}$. $\mathcal{S}$ is a multiset
in general.

Let $W_1, \dots, W_K$ be linear subspaces of $V$ and let $W = W_1 + \cdots + W_K$. We say that
a subspace $L$ is conformal to $\{W_j\}_{j=1}^{K}$ if

    $L = (L \cap W_1) + \cdots + (L \cap W_K)$.

Any $L$ conformal to $\{W_j\}_{j=1}^{K}$ is clearly a subspace of $W$. Note that the relation
$L \supset L \cap W \supset (L \cap W_1) + \cdots + (L \cap W_K)$ always holds but the inclusion is strict in general.
Consider the models (2) and (6) in Section 1.1. Let $K = 2$ and let $W_1 := L_{\{1,2\}}$ and
$W_2 := L_{\{2,3\}}$. In the case of the model (2),

    $L \cap W_1 = \{\alpha_i + \beta_j + \delta \phi_{ij}\}$,   $L \cap W_2 = \{\beta_j + \gamma_k + \delta' \psi_{jk}\}$,

where $\alpha_i, \beta_j, \gamma_k, \delta, \delta' \in \mathbb{R}$ are free parameters. Hence $L = (L \cap W_1) + (L \cap W_2)$ and (2) is
conformal to the two marginal spaces $\{L_{\{1,2\}}, L_{\{2,3\}}\}$. In the case of the model (6), however,

    $L \cap W_1 = \{\alpha_i + \beta_j\}$,   $L \cap W_2 = \{\beta_j + \gamma_k\}$.

Hence $(L \cap W_1) + (L \cap W_2) = \{\alpha_i + \beta_j + \gamma_k\}$ and the model (6) is not conformal to
$\{L_{\{1,2\}}, L_{\{2,3\}}\}$.
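
Conformality is a statement about dimensions of subspaces and can be checked by plain linear algebra. The sketch below is an illustration under assumptions: it takes $I = J = K = 3$ with $\phi_{ij} = ij$ and $\psi_{jk} = jk$, represents each subspace by a spanning set of columns, and compares the rank of $(L \cap W_1) + (L \cap W_2)$ with the rank of $L$. All function names are hypothetical.

```python
import numpy as np
from itertools import product

I = J = K = 3
DIMS = (I, J, K)
cells = list(product(range(I), range(J), range(K)))  # 0-based cells (i, j, k)

def marginal_space(axes):
    """Indicator columns spanning L_D: functions depending only on the coordinates in axes."""
    levels = list(product(*[range(DIMS[a]) for a in axes]))
    return np.array([[float(tuple(c[a] for a in axes) == lev) for lev in levels]
                     for c in cells])

def intersection(A, B, tol=1e-9):
    """Columns spanning col(A) intersected with col(B), via the null space of [A, -B]."""
    _, s, vt = np.linalg.svd(np.hstack([A, -B]))
    null = vt[int(np.sum(s > tol)):].T       # each column (u, v) satisfies A u = B v
    return A @ null[:A.shape[1], :]

def is_conformal(L, spaces, tol=1e-8):
    """Check L = (L ∩ W_1) + ... + (L ∩ W_K) by comparing dimensions."""
    parts = np.hstack([intersection(L, W) for W in spaces])
    return np.linalg.matrix_rank(parts, tol=tol) == np.linalg.matrix_rank(L, tol=tol)

main = np.hstack([marginal_space((0,)), marginal_space((1,)), marginal_space((2,))])
phi = np.array([(i + 1) * (j + 1) for i, j, k in cells], dtype=float)
psi = np.array([(j + 1) * (k + 1) for i, j, k in cells], dtype=float)

L2 = np.column_stack([main, phi, psi])   # model (2): delta and delta' free
L6 = np.column_stack([main, phi + psi])  # model (6): common delta
W12, W23 = marginal_space((0, 1)), marginal_space((1, 2))

conformal_2 = is_conformal(L2, [W12, W23])
conformal_6 = is_conformal(L6, [W12, W23])
```

Because the sum of the intersections is always contained in $L$, equality of dimensions is enough to conclude equality of subspaces. Under these assumptions the check agrees with the text: model (2) is conformal to $\{L_{\{1,2\}}, L_{\{2,3\}}\}$ and model (6) is not.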

Given a subspace $L$ consider a hierarchical model $L_\Delta \supset L$. We present the following
definition of a hierarchical subspace model.

Definition 1. $L$ is a hierarchical subspace model (HSM) of $L_\Delta$ if the following conditions
hold:

1. $L \subset L_\Delta$.

2. Each divider $S \in \mathcal{S}$ of $\mathrm{red}\,\Delta$ is saturated in $L$, i.e. $L_S \subset L$.

3. $L$ is conformal to the set of subspaces $\{L_C \mid C \in \mathcal{C}\}$.

Condition 2 guarantees that the MLE $\hat p(i)$ satisfies

    $\hat p(i) = \dfrac{\prod_{C \in \mathcal{C}} \hat p(i_C)}{\prod_{S \in \mathcal{S}} \hat p(i_S)}$.    (9)

By Condition 3 the marginal probability $\hat p(i_C)$ in the numerator of (9) coincides with
the MLE of the model $L \cap L_C$, which is computed only on the marginal table $x(i_C)$.
We confirm this fact. By induction, it is sufficient to consider the case $\mathcal{C} = \{C, R\}$
with $S = C \cap R$. The MLE of the model $L$ is the maximizer of $\sum_{i} x(i) \log p(i)$ subject
to $\log p(\cdot) \in L$ and $\sum_{i} p(i) = 1$. By Condition 3 we write $\log p(\cdot) = \theta_C + \theta_R$ with
$\theta_C \in L \cap L_C$ and $\theta_R \in L \cap L_R$. Since $L_S$ is saturated both in $L \cap L_C$ and $L \cap L_R$, we
can assume $\sum_{i_{C \setminus S}} e^{\theta_C(i_C)} = 1$ for each $i_S$ without loss of generality. Hence the problem
is decomposed into two parts: maximization of $\sum_{i_C} x(i_C) \theta_C(i_C)$ subject to $\theta_C \in L \cap L_C$
and $\sum_{i_{C \setminus S}} e^{\theta_C(i_C)} = 1$, and maximization of $\sum_{i_R} x(i_R) \theta_R(i_R)$ subject to $\theta_R \in L \cap L_R$ and
$\sum_{i_R} e^{\theta_R(i_R)} = 1$. Since the maximizer $\hat\theta_C$ does not depend on $R$, it is computed from the
case $R = S$. We have $\hat\theta_C(i) = \log\{\hat p(i_C) / (x(i_S)/n)\}$, where $\hat p(i_C)$ is the MLE of the model
$L \cap L_C$. Hence the computation of the MLE of an HSM of $L_\Delta$ is localized to each $C \in \mathcal{C}$.

2.2 Ambient decomposable model of a log-affine model

In Definition 1 $L$ is an HSM of a particular $L_\Delta$. Note that by definition every log-affine
model $L$ is an HSM of the saturated model $L_{[m]}$. Therefore every log-affine model $L$ has
a hierarchical model for which $L$ is an HSM and a natural question is to look for a small
simplicial complex $\Delta$ such that $L$ is an HSM of $L_\Delta$. As the main theoretical result of
this paper we show in Theorem 1 below that for each log-affine model $L$ there exists a
natural smallest decomposable model $L_{\mathcal{H}}$ such that $L$ is an HSM of $L_{\mathcal{H}}$. We call such $L_{\mathcal{H}}$
the ambient decomposable model of $L$.

We want to define the ambient decomposable model in such a way that the conditional
independence model $i \perp\!\!\!\perp k \mid j$ is the ambient decomposable model for (2) and the
saturated model $L_{[3]}$ is the ambient decomposable model for (6).

In order to define the ambient decomposable model, we first define connectedness of
$L$ and a partial edge separator of $L$. $L$ is called disconnected if there exists a non-empty
proper subset $A$ of $[m]$ such that $L$ is conformal to $\{L_A, L_{A^c}\}$, where $A^c$ denotes the
complement of $A$. This means that the variables in $A$ and the variables in $A^c$ are
independent and independently modeled in $L$. We call $L$ connected if $L$ is not disconnected. It
is obvious that under this definition $L$ can be decomposed into its connected components
and each connected component can be investigated separately. Therefore from now on we
assume that $L$ is connected.

Definition 2. A non-empty subset $S$ of $[m]$ is called a partial edge separator of $L$ if $[m]$
is partitioned into three non-empty and disjoint subsets $\{A_1, A_2, S\}$ such that

1. $S$ is saturated in $L$.

2. $L$ is conformal to $\{L_{A_1 \cup S}, L_{A_2 \cup S}\}$.

Then we call the triple $(A_1, A_2, S)$ a decomposition of $L$. When the model $L$ has a partial
edge separator, we call $L$ reducible. A pair of vertices $i$ and $j$ are called tightly connected
in $L$ if there does not exist a decomposition $(A_1, A_2, S)$ of $L$ such that $i \in A_1$ and $j \in A_2$.
When $L$ is not reducible, we call $L$ prime.

A set of vertices such that any two of them are tightly connected in $L$ is called an extended
compact component of $L$. The set of maximal extended compact components of $L$ is
considered as a hypergraph and is denoted by $\mathcal{H}$. Denote by $L_{\mathcal{H}}$ the hierarchical model
induced by $\mathcal{H}$. Then we have the following theorem.

Theorem 1. $L_{\mathcal{H}}$ is the smallest decomposable model with respect to inclusion relation
such that $L$ is an HSM of $L_{\mathcal{H}}$.

The following corollary is obvious from (9).

Corollary 1. The MLE $\hat p(i)$ satisfies

    $\hat p(i) = \dfrac{\prod_{C \in \mathcal{H}} \hat p(i_C)}{\prod_{S \in \mathcal{S}} \hat p(i_S)}$,

where $\mathcal{S}$ is the set of dividers of $\mathcal{H}$ and $\hat p(i_C)$ depends only on the marginal table $x(i_C)$.

The rest of this subsection is devoted to a proof of Theorem 1. Before we give the
proof, we present some lemmas required to prove the theorem.

Lemma 1. If $S$ is a partial edge separator of $L$, $S$ is also a partial edge separator of $\mathcal{H}$.

Proof. Since $S$ is saturated in $L$, $S$ is an extended compact component. Hence $S$ is a
partial edge of $\mathcal{H}$. Denote by $\mathcal{H}([m] \setminus S)$ the subhypergraph of $\mathcal{H}$ induced by $[m] \setminus S$.
Assume that $S$ is not a separator of $\mathcal{H}$. Then $\mathcal{H}([m] \setminus S)$ is connected.

Since $S$ is a separator of $L$, there exists a decomposition $(A, B, S)$ of $L$ by definition.
Define $\mathcal{H}(A)$ and $\mathcal{H}(B)$ by

    $\mathcal{H}(A) := \{C \in \mathcal{H} \mid A \cap C \ne \emptyset\}$,   $\mathcal{H}(B) := \{C \in \mathcal{H} \mid B \cap C \ne \emptyset\}$.

Then we have $\mathcal{H}(A) \cap \mathcal{H}(B) = \emptyset$, which contradicts the fact that $\mathcal{H}([m] \setminus S)$ is connected.

□

By using Lemma 1 we can prove the following lemma in the same way as Theorem 5
of Malvestuto and Moscarini [2000].



Lemma 2. $\mathcal{H}$ is acyclic.

Denote by $\mathcal{S}$ the set of dividers of $\mathcal{H}$.

Lemma 3. Suppose $S \in \mathcal{S}$ is a divider of $\mathcal{H}$ with a decomposition $(A, B, S)$. Then $S$ is
a partial edge separator of $L$ with a decomposition $(A, B, S)$.

Proof. Since $S$ is a divider, there exists a pair of vertices $\{u, v\}$ such that $S$ is the unique
minimal partial edge separating $u$ and $v$. Then there exists a decomposition $(A, B, S)$
such that $u \in A$ and $v \in B$. Any vertices in $A$ and any vertices in $B$ are not tightly
connected in $L$. This implies that there exists a partial edge separator $S' \subset S$ of $L$ and
a decomposition $(A', B', S')$ of $L$ satisfying $A' \supset A$ and $B' \supset B$. From Lemma 1 $S'$ is
also a partial edge separator of $\mathcal{H}$. Noting that $S$ is the unique minimal partial edge of
$\mathcal{H}$ separating $u$ and $v$, we have $S' = S$. Then $(A, B, S)$ is a decomposition of $L$. □

Now we provide a proof of Theorem 1.

Proof of Theorem 1. It is obvious that $L \subset L_{\mathcal{H}}$. From Lemma 3 every divider $S \in \mathcal{S}$
of $\mathcal{H}$ is a partial edge separator of $L$ and hence saturated in $L$. From Lemma 2 $\mathcal{H}$ is
considered as the set of maximal cliques of a chordal graph $G^{\mathcal{H}}$. Let $C_k$, $k = 1, \dots, K$, be
a perfect sequence of maximal cliques in $G^{\mathcal{H}}$ (see e.g. Section 2.1.3 of Lauritzen [1996]).
Let

    $B_k := C_1 \cup C_2 \cup \cdots \cup C_k$,   $R_k := (C_k \cup C_{k+1} \cup \cdots \cup C_K) \setminus S_k$,   $S_k := B_{k-1} \cap C_k$.

It is known that $S_k$ is a divider of $\mathcal{H}$ with a decomposition $(B_{k-1}, R_k, S_k)$. From Lemma
3 $S_k$ is a partial edge separator in $L$ with the same decomposition. In particular, for $k = K$,
$L$ is conformal to $\{L_{B_{K-1}}, L_{C_K}\}$, i.e.

    $L = (L \cap L_{B_{K-1}}) + (L \cap L_{C_K})$.

In the same way $S_{K-1}$ is a partial edge separator in $L$ with a decomposition $(B_{K-2}, R_{K-1}, S_{K-1})$
and hence $L$ is conformal to $\{L_{B_{K-2}}, L_{C_K \cup C_{K-1}}\}$, i.e.

    $L = (L \cap L_{B_{K-2}}) + (L \cap L_{C_K \cup C_{K-1}})$
      $= (L \cap L_{B_{K-2}}) + [((L \cap L_{B_{K-1}}) + (L \cap L_{C_K})) \cap L_{C_K \cup C_{K-1}}]$
      $= (L \cap L_{B_{K-2}}) + (L \cap L_{C_{K-1}}) + (L \cap L_{C_K})$.

By iterating this procedure, we can obtain $L = (L \cap L_{C_1}) + \cdots + (L \cap L_{C_K})$. Hence $L$ is
conformal to $\{L_C \mid C \in \mathcal{H}\}$. Therefore $L$ is an HSM of $L_{\mathcal{H}}$.

Suppose that there exists a smaller decomposable model $L_{\mathcal{H}'} \subset L_{\mathcal{H}}$ for which $L$ is an
HSM. Then there exist $C \in \mathcal{H}$ and a divider $S'$ of $\mathcal{H}'$ such that $S' \subset C$. This contradicts
the fact that any vertices in $C$ are tightly connected in $L$. □

2.3 Hierarchical models containing a log-affine model

In Theorem [1] we have shown the existence of the smallest decomposable model containing 
a log-affine model. Then a natural question is to ask whether there exists the smallest 
hierarchical model containing a log-affine model. In general this does not hold and we 
here discuss properties of hierarchical models containing a log-affine model. 

As an example consider the model (6). Although (6) is a submodel of the conditional
independence model $i \perp\!\!\!\perp k \mid j$, (6) is not an HSM of the conditional independence model.
The difficulty lies in the fact that a hierarchical model containing $L$ may have a partial
edge separator which is not a partial edge separator of $L$.

Given a log-affine model $L$, consider the set of hierarchical models $L_\Delta$ containing $L$:
$\{L_\Delta \mid L_\Delta \supset L\}$. Because of the relation $L_\Delta \cap L_{\Delta'} = L_{\Delta \cap \Delta'}$ it follows that there exists the
smallest hierarchical model in $\{L_\Delta \mid L_\Delta \supset L\}$. We call the smallest hierarchical model
containing $L$ the hierarchical closure of $L$ and denote the corresponding simplicial complex
and the hierarchical model by $\Delta(L)$ and $L_{\Delta(L)}$, respectively. Note that for both (2) and
(6), the hierarchical closure is the three-way conditional independence model (1).

We call a log-affine model $L$ a tight hierarchical subspace model if $L$ is an HSM of
$L_{\Delta(L)}$. If $L$ is a tight HSM, obviously $\Delta(L)$ is the smallest simplicial complex such that
$L$ is its HSM.

We now present an example of a log-affine model $L$ of a 5-way contingency table,
which has two minimal hierarchical models $\Delta_1$, $\Delta_2$, such that $L$ is an HSM of both $L_{\Delta_1}$
and $L_{\Delta_2}$. Consider the following model $L$ of 5-way contingency tables:

    $\log p(i_1, \dots, i_5) = \sum_{j=1}^{5} \alpha_{\{j\}}(i_j) + \delta\big(\psi_{\{1,2\}}(i_1, i_2) + \psi_{\{1,3\}}(i_1, i_3) + \psi_{\{2,3\}}(i_2, i_3)$
        $+ \psi_{\{2,4\}}(i_2, i_4) + \psi_{\{3,5\}}(i_3, i_5) + \psi_{\{4,5\}}(i_4, i_5)\big)$,

where the main effects $\alpha_{\{j\}}$'s and $\delta$ are parameters and the $\psi_{\{j,j'\}}$'s are fixed functions. The
hierarchical closure of this model is given by

    $\mathrm{red}\,\Delta = \{\{1, 2\}, \{1, 3\}, \{2, 3\}, \{2, 4\}, \{3, 5\}, \{4, 5\}\}$,

which has a divider $\{2, 3\}$. Hence $L$ is not tight. Note that $L$ is an HSM of any $L_{\Delta'}$ such
that $L_{\Delta'}$ does not possess a partial edge separator and $L \subset L_{\Delta'}$. As in Figure 1 define

    $\mathrm{red}\,\Delta_1 = \mathrm{red}\,\Delta \cup \{\{1, 4\}\}$,   $\mathrm{red}\,\Delta_2 = \mathrm{red}\,\Delta \cup \{\{1, 5\}\}$.

Then $L$ is an HSM of both $L_{\Delta_1}$ and $L_{\Delta_2}$.




Figure 1: Two ways to cross a divider of the hierarchical closure 



3 Split model as a hierarchical subspace model 



We consider the split model by Højsgaard [2003]. An example of the split model for
three-way tables is $L = L_{\{1,2\}} + L_{\{2,3\}} + L_{\{1,3\}}^{j_{\{2\}}}$ with $j_{\{2\}} = 2$, which represents that there exists a conditional
interaction of $i_1$ and $i_3$ given $i_2$ only if $i_2 = 2$. A precise definition is given below.



We first define the context specific interaction (CSI) model (Højsgaard [2004]). The
split model is a particular case of the CSI model. Recall that $V = \mathbb{R}^{\mathcal{I}}$ is the set of all
tables. For any subset $B$ of $[m]$ and $j_B \in \mathcal{I}_B$, we consider a subspace $L^{j_B}$ of $V$ in which
only the $j_B$-slice has nonzero components, that is,

    $L^{j_B} = \{\psi \in V \mid \psi(i) = 0 \text{ if } i_B \ne j_B\}$
        $= \{\psi \in V \mid \psi(i) = f(i_{[m] \setminus B})\, 1_{\{i_B = j_B\}},\ f : \mathcal{I}_{[m] \setminus B} \to \mathbb{R}\}$.

If $B$ is empty, we define $L^{j_B} = V$ with a dummy symbol $j_B$. For any subsets $B$ and $D$ of
$[m]$ and any level $j_B \in \mathcal{I}_B$, we define a subspace

    $L_D^{j_B} = L_{D \cup B} \cap L^{j_B} = \{\psi \in V \mid \psi(i) = f(i_{D \setminus B})\, 1_{\{i_B = j_B\}},\ f : \mathcal{I}_{D \setminus B} \to \mathbb{R}\}$.

The subspace $L_D^{j_B}$ represents a context specific interaction, that is, an interaction over $i_D$
exists only if $i_B = j_B$. The following relation is easily proved:

    $L_{B \cup D} = \bigoplus_{j_B \in \mathcal{I}_B} L_D^{j_B}$.    (10)

A context specific interaction (CSI) model is a direct sum of subspaces $L_D^{j_B}$ for a set of
$(j_B, D)$'s. It is easily shown that any hierarchical model is a CSI model.

Next we define split models. In order to clarify the definition, we consider a more
general model, the split subspace model. The split model is a particular case of the split
subspace models. Although Højsgaard [2003] defined the split model on the basis of a
graphical model, we let the graphical model be a decomposable model for simplicity.

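Consider a decomposable model $L_\Delta$ with the set of maximal cliques $\mathcal{C}$. For each
$C \in \mathcal{C}$ choose a subset $Z(C) \subset C$. We admit the case where $Z(C)$ is empty. For each
$j_{Z(C)} \in \mathcal{I}_{Z(C)}$, choose a subspace $N_C^{j_{Z(C)}} \subset L_C^{j_{Z(C)}}$ such that

    $\forall C' \in \mathcal{C} \setminus \{C\}, \quad L_{C \cap C'}^{j_{Z(C)}} \subset N_C^{j_{Z(C)}} \subset L_C^{j_{Z(C)}}$.    (11)

Then a log-affine model $L$ is defined by

    $L = \sum_{C \in \mathcal{C}} N_C, \qquad N_C = \bigoplus_{j_{Z(C)} \in \mathcal{I}_{Z(C)}} N_C^{j_{Z(C)}}$.    (12)

We call $L$ a split subspace model with root $\mathcal{C}$ if it satisfies (11) and (12). The following
proposition holds.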

Proposition 1. Let $L_\Delta$ be a decomposable model with the cliques $\mathcal{C}$. Then any split
subspace model $L$ with root $\mathcal{C}$ is an HSM of $L_\Delta$.

Proof. We first check $L \subset \sum_{C \in \mathcal{C}} L_C$. Since $L = \sum_{C \in \mathcal{C}} N_C$, it is sufficient to show $N_C \subset L_C$
for each $C \in \mathcal{C}$. However, this is clear because $N_C^{j_{Z(C)}} \subset L_C^{j_{Z(C)}} \subset L_C$ for any $j_{Z(C)}$. Next
we prove that $L_S \subset L$ for any divider $S$. From the definition of dividers of decomposable
models, there exist two cliques $C$ and $C'$ ($C \ne C'$) such that $S = C \cap C'$. By the relations
(10) and (11), we have

    $L_S \subset L_{(C' \cap C) \cup Z(C)} = \bigoplus_{j_{Z(C)}} L_{C' \cap C}^{j_{Z(C)}} \subset \bigoplus_{j_{Z(C)}} N_C^{j_{Z(C)}} = N_C$.

Therefore $L_S \subset L$. Lastly, we prove that $L$ is conformal to $\{L_C \mid C \in \mathcal{C}\}$. We have
already proved $N_C \subset L_C$. Since $N_C$ is also a subspace of $L$, we obtain $N_C \subset L \cap L_C$ and
therefore $L = \sum_{C \in \mathcal{C}} N_C \subset \sum_{C \in \mathcal{C}} (L \cap L_C)$. The opposite inclusion is obvious. □

Now we define a split model as a special case of split subspace models. We say that
any decomposable model is a split model of degree zero. Then a split model of degree one
is defined as the decomposition (12) with

    $N_C^{j_{Z(C)}} = \sum_{D \in \mathcal{C}_C^{j_{Z(C)}}} L_D^{j_{Z(C)}}$,

where $\mathcal{C}_C^{j_{Z(C)}}$ is a decomposable model with the vertex set $C \setminus Z(C)$. Here we assume

    $\forall C' \in \mathcal{C} \setminus \{C\},\ \exists D \in \mathcal{C}_C^{j_{Z(C)}} \text{ s.t. } (C \cap C') \setminus Z(C) \subset D$    (13)

to assure the condition (11). Split models of degree greater than one are defined
recursively. See Højsgaard [2003] for details.

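In Section 5 we will consider an example of the split model (of degree one). The
following elementary lemma is useful to obtain the MLE of split models.

Lemma 4. Let $\mathcal{I} = \bigsqcup_{\lambda \in \Lambda} \mathcal{I}_\lambda$ be a partition of $\mathcal{I}$ and consider subspaces $N_\lambda \subset V$ such that

    $N_\lambda \subset \{\psi \in V \mid \psi(i) = 0 \text{ if } i \notin \mathcal{I}_\lambda\}$.

Then the MLE of the model $\bigoplus_{\lambda \in \Lambda} N_\lambda$ is given by $\hat p(i) = \sum_{\lambda \in \Lambda} (n_\lambda / n)\, \hat p_\lambda(i)\, 1_{\{i \in \mathcal{I}_\lambda\}}$, where $\hat p_\lambda(i)$
is the MLE of the model $N_\lambda$ with the total frequency $n_\lambda = \sum_{i \in \mathcal{I}_\lambda} x(i)$.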

4 Conditional tests of hierarchical subspace models 
via Markov bases 

In this section we discuss conditional tests of our proposed model via the Markov bases
technique. In Section 1.1 we have discussed that the divide-and-conquer approach of
Dobra and Sullivant [2004] still works for the model (2). In this section we generalize the
argument to an HSM $L$.

Let $x = \{x(i)\}_{i \in \mathcal{I}}$ denote an $m$-way contingency table, where $x(i)$ denotes the
frequency of the cell $i \in \mathcal{I}$. Let $b$ be the set of sufficient statistics for $L$. We assume that the
elements of $b$ are integer combinations of the frequencies $x(i)$. For a hierarchical model
$L_\Delta$, $b$ is written by

    $b = \{x_D(i_D),\ i_D \in \mathcal{I}_D,\ D \in \mathrm{red}\,\Delta\}$,

where $x_D(i_D) = \sum_{i_{D^c} \in \mathcal{I}_{D^c}} x(i)$. We consider $b$ as a column vector with dimension
$\nu$. We order the elements of $x$ appropriately and consider $x$ as a column vector. Then the
relation between the joint frequencies $x$ and the marginal frequencies $b$ is written simply
as

    $b = Ax$,

where $A$ is a $\nu \times |\mathcal{I}|$ integer matrix. $A$ is called the configuration for $L$.
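
As a concrete illustration, the configuration of a hierarchical model can be assembled row by row from indicator functions of marginal cells. The sketch below assumes a $2 \times 3 \times 2$ table and the conditional independence model with facets $\{1,2\}$ and $\{2,3\}$ (written 0-based in the code); the function name `configuration` and the example table are hypothetical.

```python
import numpy as np
from itertools import product

def configuration(shape, facets):
    """Configuration matrix A of the hierarchical model with the given facets.

    Rows are indexed by pairs (D, i_D), columns by cells in row-major order,
    so that A @ x.ravel() stacks the D-marginal tables of x.
    """
    cells = list(product(*[range(s) for s in shape]))
    rows = []
    for D in facets:
        for iD in product(*[range(shape[a]) for a in D]):
            rows.append([int(tuple(c[a] for a in D) == iD) for c in cells])
    return np.array(rows)

shape = (2, 3, 2)
A = configuration(shape, facets=[(0, 1), (1, 2)])   # facets {1,2} and {2,3}, 0-based
x = np.arange(12).reshape(shape)                    # hypothetical table
b = A @ x.ravel()                                   # sufficient statistic b = A x
```

The first six entries of `b` are the $(1,2)$-marginals of `x` and the last six are the $(2,3)$-marginals. The rows of `A` are linearly dependent because both blocks determine the marginal of the second factor.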

For a subset $D \subset [m]$, denote $L(D) := L \cap L_D$. Let $(A_1, A_2, S)$ be a decomposition
of $L$ and define $V_1 := A_1 \cup S$ and $V_2 := A_2 \cup S$. Since $L$ is conformal to $\{L_{V_1}, L_{V_2}\}$, we
note that $L(V_1)$ and $L(V_2)$ are marginal models corresponding to $V_1$ and $V_2$, respectively.
Denote by $A_{V_1} = \{a_{V_1}(i_{V_1})\}_{i_{V_1} \in \mathcal{I}_{V_1}}$ and $A_{V_2} = \{a_{V_2}(i_{V_2})\}_{i_{V_2} \in \mathcal{I}_{V_2}}$ configurations for the
marginal models $L(V_1)$ and $L(V_2)$, where $a_{V_1}(i_{V_1})$ and $a_{V_2}(i_{V_2})$ denote column vectors of
$A_{V_1}$ and $A_{V_2}$, respectively. Noting that $i_{V_1} = (i_{A_1} i_S)$ and $i_{V_2} = (i_S i_{A_2})$, the configuration
$A$ for $L$ is written by

    $A = A_{V_1} \oplus_S A_{V_2} = \{a_{V_1}(i_{A_1} i_S) \oplus a_{V_2}(i_S i_{A_2})\}_{i_{A_1} \in \mathcal{I}_{A_1},\, i_S \in \mathcal{I}_S,\, i_{A_2} \in \mathcal{I}_{A_2}}$,

where

    $a_{V_1}(i_{A_1} i_S) \oplus a_{V_2}(i_S i_{A_2}) = \begin{pmatrix} a_{V_1}(i_{A_1} i_S) \\ a_{V_2}(i_S i_{A_2}) \end{pmatrix}$.

Given $b$, the set

    $\mathcal{F}_b = \{x \ge 0 \mid b = Ax\}$

of contingency tables sharing the same $b$ is called a fiber. An integer array $z = \{z(i)\}_{i \in \mathcal{I}}$
of the same dimension as $x$ is called a move if $Az = 0$. As in (8), we denote $z$ with degree
$d$ as

    $z = [\{i_1, \dots, i_d\} \,\|\, \{i'_1, \dots, i'_d\}]$,

where $i_1, \dots, i_d \in \mathcal{I}$ are cells (with replication) of positive elements of $z$ and $i'_1, \dots, i'_d \in \mathcal{I}$
are cells of negative elements of $z$. Moves are used for steps of Markov chain Monte Carlo
simulation within each fiber. If we add or subtract a move $z$ to $x \in \mathcal{F}_b$, then $x \pm z \in \mathcal{F}_b$
and we can move from $x$ to another state $x + z$ (or $x - z$) in the same fiber $\mathcal{F}_b$, as long
as there is no negative element in $x + z$ (or $x - z$). A finite set $\mathcal{M}$ of moves is called a
Markov basis if for every fiber the states become mutually accessible by the moves from
$\mathcal{M}$.
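
How moves drive a walk inside a fiber can be sketched in a few lines. The example assumes the simplest setting, the two-way independence model on a $2 \times 3$ table with the familiar basic moves; the starting table and the function names are hypothetical. The walk shown is the plain symmetric random walk: for an actual conditional test one would add a Metropolis-Hastings acceptance step targeting the conditional null distribution.

```python
import numpy as np

def basic_moves(J):
    """Basic moves of the 2 x J independence model: +1 -1 / -1 +1 on a pair of columns."""
    moves = []
    for a in range(J):
        for b in range(a + 1, J):
            z = np.zeros((2, J), dtype=int)
            z[0, a], z[1, b], z[0, b], z[1, a] = 1, 1, -1, -1
            moves.append(z)
    return moves

def fiber_walk(x, moves, steps, rng):
    """Random walk on the fiber of x: propose x + z or x - z, reject if a cell goes negative."""
    x = x.copy()
    for _ in range(steps):
        z = moves[rng.integers(len(moves))] * rng.choice([-1, 1])
        if np.all(x + z >= 0):
            x = x + z
    return x

rng = np.random.default_rng(0)
x0 = np.array([[3, 1, 2], [0, 4, 1]])      # hypothetical starting table
x1 = fiber_walk(x0, basic_moves(3), steps=1000, rng=rng)
```

Every visited table has the same row and column sums as `x0`, because each move has zero margins, and the rejection rule keeps all entries nonnegative.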

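Assume that $\mathcal{B}(V_1)$ and $\mathcal{B}(V_2)$ are Markov bases for $L(V_1)$ and $L(V_2)$, respectively. Let
$z_1 = \{z_1(i_{V_1})\}_{i_{V_1} \in \mathcal{I}_{V_1}} \in \mathcal{B}(V_1)$ and $z_2 = \{z_2(i_{V_2})\}_{i_{V_2} \in \mathcal{I}_{V_2}} \in \mathcal{B}(V_2)$. Since $S$ is saturated, we
have

    $\sum_{i_{A_1} \in \mathcal{I}_{A_1}} z_1(i_{A_1}, i_S) = 0, \qquad \sum_{i_{A_2} \in \mathcal{I}_{A_2}} z_2(i_S, i_{A_2}) = 0$.

Hence $z_1$ and $z_2$ can be written as

    $z_1 = [\{(i_1, j_1), \dots, (i_d, j_d)\} \,\|\, \{(i'_1, j_1), \dots, (i'_d, j_d)\}]$,
    $z_2 = [\{(j_1, k_1), \dots, (j_d, k_d)\} \,\|\, \{(j_1, k'_1), \dots, (j_d, k'_d)\}]$,    (14)

respectively, where $i_k, i'_k \in \mathcal{I}_{A_1}$, $j_k \in \mathcal{I}_S$ and $k_k, k'_k \in \mathcal{I}_{A_2}$ for $k = 1, \dots, d$.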



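Definition 3 (Dobra and Sullivant 2004). Define $z_1 \in \mathcal{B}(V_1)$ as in (14). Let $k := \{k_1, \ldots, k_d\} \in \mathcal{I}_{A_2} \times \cdots \times \mathcal{I}_{A_2}$. Define $z_1^k$ by

$$z_1^k := [\{(i_1, j_1, k_1), \ldots, (i_d, j_d, k_d)\} \,\|\, \{(i'_1, j_1, k_1), \ldots, (i'_d, j_d, k_d)\}].$$

Then we define $\mathrm{Ext}(\mathcal{B}(V_1) \to L)$ by

$$\mathrm{Ext}(\mathcal{B}(V_1) \to L) := \{z_1^k \mid k \in \mathcal{I}_{A_2} \times \cdots \times \mathcal{I}_{A_2}\}.$$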



In the same way as Lemma 5.4 in Dobra and Sullivant (2004) we can obtain the following lemma.

Lemma 5. Suppose that $z_1 \in \mathcal{B}(V_1)$ as in (14). Then $\mathrm{Ext}(\mathcal{B}(V_1) \to L)$ is a set of moves for $L$.



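Proof. Let $z \in \mathrm{Ext}(\mathcal{B}(V_1) \to L)$. Then we have

$$Az = \begin{pmatrix} \sum_{i_{V_1} \in \mathcal{I}_{V_1}} a_{V_1}(i_{V_1})\, z_{V_1}(i_{V_1}) \\ \sum_{i_{V_2} \in \mathcal{I}_{V_2}} a_{V_2}(i_{V_2})\, z_{V_2}(i_{V_2}) \end{pmatrix},$$

where

$$z_{V_1}(i_{V_1}) = \sum_{i_{A_2} \in \mathcal{I}_{A_2}} z(i_{A_1} i_S i_{A_2}), \qquad z_{V_2}(i_{V_2}) = \sum_{i_{A_1} \in \mathcal{I}_{A_1}} z(i_{A_1} i_S i_{A_2}).$$

Since $z_{V_1}(i_{V_1}) = z_1(i_{V_1})$ and $z_1 \in \mathcal{B}(V_1)$, we have $\sum_{i_{V_1} \in \mathcal{I}_{V_1}} a_{V_1}(i_{V_1})\, z_{V_1}(i_{V_1}) = 0$. From Definition 3, $z_{V_2}(i_{V_2}) = 0$ for all $i_{V_2} \in \mathcal{I}_{V_2}$. Hence $Az = 0$. □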
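As an illustration of Definition 3 and Lemma 5, the following sketch builds the extensions $z_1^k$ of a single move and checks the two marginal identities used in the proof: the $V_1$-marginal of $z_1^k$ equals $z_1$ and the $V_2$-marginal vanishes. Moves are stored as sparse dictionaries. The helper names (`extend_move`, `marginal`, `ext`) and the example move are assumptions made for illustration, with $A_1$, $S$ and $A_2$ each consisting of a single factor.

```python
from collections import Counter
from itertools import product

def extend_move(pos, neg, ks):
    """Extend z1 = [pos || neg] on (i, j) cells to z1^k on (i, j, k) cells.
    pos[l] = (i_l, j_l) and neg[l] = (i'_l, j_l) must share the same j_l, as in (14)."""
    assert len(pos) == len(neg) == len(ks)
    assert all(p[1] == q[1] for p, q in zip(pos, neg))
    z = Counter()
    for (i, j), (i2, _), k in zip(pos, neg, ks):
        z[(i, j, k)] += 1    # positive cell (i_l, j_l, k_l)
        z[(i2, j, k)] -= 1   # negative cell (i'_l, j_l, k_l)
    return z

def marginal(z, axes):
    """Sum the sparse array z over all coordinates not listed in axes."""
    m = Counter()
    for cell, v in z.items():
        m[tuple(cell[a] for a in axes)] += v
    return {c: v for c, v in m.items() if v != 0}

def ext(pos, neg, levels_k):
    """All extensions z1^k of one move z1, with k ranging over (levels_k)^d."""
    d = len(pos)
    return [extend_move(pos, neg, ks) for ks in product(levels_k, repeat=d)]

# Hypothetical degree-2 move on a two-way table, written in the form (14).
pos, neg = [(0, 0), (1, 1)], [(1, 0), (0, 1)]
moves = ext(pos, neg, levels_k=[0, 1, 2])
print(len(moves))
print(all(marginal(z, (1, 2)) == {} for z in moves))
```

The second printed value reports whether every extended move has a vanishing $(S, A_2)$-marginal, which is the step of the proof that relies on the positive and negative cells sharing $j_l$ and $k_l$.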

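Theorem 2. Let $\mathcal{B}(V_1)$ and $\mathcal{B}(V_2)$ be Markov bases for $L(V_1)$ and $L(V_2)$, respectively. Let $\mathcal{B}_{V_1,V_2}$ be a Markov basis for the hierarchical model with two cliques $V_1$ and $V_2$. Then

$$\mathcal{B} := \mathrm{Ext}(\mathcal{B}(V_1) \to L) \cup \mathrm{Ext}(\mathcal{B}(V_2) \to L) \cup \mathcal{B}_{V_1,V_2} \tag{15}$$

is a Markov basis for $L$.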



We can prove the theorem in the same way as Theorem 5.6 in Dobra and Sullivant (2004). Suppose that $L$ is an HSM of a hierarchical model. Then Theorem 2 implies that a Markov basis for $L$ is obtained from the Markov bases $\mathcal{B}(C)$ of its maximal extended compact components $C$ by recursively using (15). This shows that the computation of a Markov basis can be localized according to the maximal extended compact components of $L$.

Concerning Markov bases of the split model of Section 3, we state the following lemma.

Lemma 6. With the same notation as in Lemma^ a Markov basis of the model XIa^^ 
is given by union of Markov bases of Nx- 



5 Examples 

In this section we give several applications of HSMs. In Section 5.1 we analyze the data on the song sequence of a wood pewee, which we already discussed in Section 1.1. In Section 5.2 we consider an example of a split model.

5.1 Sequences of unrepeated events 

Consider the data on the song sequence of a wood pewee in Table 1. As mentioned in Section 1.1, it is a $4 \times 4 \times 4$ contingency table with the cells of the form $(i,i,k)$ and $(i,j,j)$ being structural zeros. The probability function $\{p_{ijk}\}$ satisfies the condition $p_{iik} = 0$ and $p_{ijj} = 0$, or equivalently, $\log p_{iik} = -\infty$ and $\log p_{ijj} = -\infty$. Hence $\{\log p_{ijk}\}$ is not an element of $V = \mathbb{R}^{4 \times 4 \times 4}$. However we can replace $V$ by $\mathbb{R}^{\mathcal{I}'}$, where

$$\mathcal{I}' = \mathcal{I} \setminus \left(\{(i,i,k),\ i,k \in [4]\} \cup \{(i,j,j),\ i,j \in [4]\}\right),$$

and consider log-affine models of $\mathbb{R}^{\mathcal{I}'}$. Formally it is more convenient to proceed with $V = \mathbb{R}^{4 \times 4 \times 4}$, allowing $\log p_{iik} = \log p_{ijj} = -\infty$.

We first consider the conditional independence model

$$L_{\mathrm{Model1}} = L_{\{1,2\}} + L_{\{2,3\}},$$

which corresponds to (7). The MLE of this model is explicitly given by

$$\hat{p}_{ijk} = \frac{x_{ij+}\, x_{+jk}}{n\, x_{+j+}}.$$

A Markov basis of the model is $\mathcal{B}_{\mathrm{Model1}} = \mathcal{B}_{\{1,2\},\{2,3\}}$ (see Theorem 2 for the notation). An experimental result that compares the saturated model and Model 1 is given in Figure 2. Both the asymptotic and experimental estimates of the p-value are almost zero.
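The closed-form MLE of Model 1 and its deviance against the saturated model can be sketched as follows. The $3 \times 3 \times 3$ counts are hypothetical (they are not the wood pewee data) and only mimic the structural zeros at $(i,i,k)$ and $(i,j,j)$; `fit_cond_indep` and `deviance` are illustrative names.

```python
import math

def fit_cond_indep(x):
    """Fitted counts n * p_ijk = x_{ij+} * x_{+jk} / x_{+j+} for L_{1,2} + L_{2,3}."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    x_ij = [[sum(x[i][j]) for j in range(J)] for i in range(I)]
    x_jk = [[sum(x[i][j][k] for i in range(I)) for k in range(K)] for j in range(J)]
    x_j = [sum(x_ij[i][j] for i in range(I)) for j in range(J)]
    return [[[x_ij[i][j] * x_jk[j][k] / x_j[j] if x_j[j] > 0 else 0.0
              for k in range(K)] for j in range(J)] for i in range(I)]

def deviance(x, m):
    """G^2 = 2 * sum x log(x / m), with the convention 0 log 0 = 0."""
    g = 0.0
    for x_i, m_i in zip(x, m):
        for x_ij, m_ij in zip(x_i, m_i):
            for v, f in zip(x_ij, m_ij):
                if v > 0:
                    g += v * math.log(v / f)
    return 2.0 * g

# Hypothetical counts x[i][j][k]; cells with i == j or j == k are structural zeros.
x = [
    [[0, 0, 0], [5, 0, 3], [2, 4, 0]],
    [[0, 6, 1], [0, 0, 0], [3, 2, 0]],
    [[0, 2, 7], [4, 0, 1], [0, 0, 0]],
]
m = fit_cond_indep(x)
print(deviance(x, m))
```

Since the margins $x_{ii+}$ and $x_{+jj}$ are zero, the fitted counts vanish on the structural zeros without any special handling.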

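Although Model 1 does not fit the data, we proceed to consider a submodel of Model 1 for theoretical interest. Let

$$L_{\mathrm{Model2}} = \{\alpha_i + \beta_j + \gamma_k + \phi_i 1_{\{i=j\}} + \psi_j 1_{\{j=k\}}\}.$$

This model is an HSM of $L_{\{1,2\}} + L_{\{2,3\}}$. It represents a quasi-independence model for the three-way table. The MLE of the model is

$$\hat{p}_{ijk} = \frac{\hat{p}^{(1)}_{ij}\, \hat{p}^{(2)}_{jk}}{x_{+j+}/n},$$

where $\hat{p}^{(1)}_{ij}$ and $\hat{p}^{(2)}_{jk}$ are the MLEs of the 2-way quasi-independence models with the diagonal structural zeros, that is,

$$\hat{p}^{(1)}_{ij} = e^{\alpha_i} e^{\beta_j} 1_{\{i \neq j\}}, \quad \hat{p}^{(1)}_{i+} = x_{i++}/n, \quad \hat{p}^{(1)}_{+j} = x_{+j+}/n,$$

$$\hat{p}^{(2)}_{jk} = e^{\beta'_j} e^{\gamma_k} 1_{\{j \neq k\}}, \quad \hat{p}^{(2)}_{j+} = x_{+j+}/n, \quad \hat{p}^{(2)}_{+k} = x_{++k}/n.$$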

They are computed by the iterative proportional fitting method. By Theorem 2 a Markov basis is given by

$$\mathcal{B}_{\mathrm{Model2}} = \mathcal{B}_{\{1,2\},\{2,3\}} \cup \mathrm{Ext}(\mathcal{B}(\{1,2\}) \to V) \cup \mathrm{Ext}(\mathcal{B}(\{2,3\}) \to V),$$

where $\mathcal{B}(\{1,2\})$ and $\mathcal{B}(\{2,3\})$ are the Markov bases of the 2-way quasi-independence model with structural zeros obtained by Aoki and Takemura (2005). An experimental result that compares Model 1 and Model 2 is given in Figure 2.
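A minimal sketch of iterative proportional fitting for a two-way quasi-independence model with diagonal structural zeros is given below. It assumes a square table with positive off-diagonal counts, so that the MLE exists; the function name, the number of iterations and the $4 \times 4$ counts are hypothetical.

```python
def ipf_quasi_independence(x, n_iter=200):
    """Fit m_ij = a_i * b_j * 1{i != j} so that the row and column sums of m
    match those of the square table x (diagonal cells are structural zeros)."""
    r = len(x)
    rows = [sum(x[i]) for i in range(r)]
    cols = [sum(x[i][j] for i in range(r)) for j in range(r)]
    # Start from the indicator of the off-diagonal cells.
    m = [[0.0 if i == j else 1.0 for j in range(r)] for i in range(r)]
    for _ in range(n_iter):
        for i in range(r):                       # rescale each row to its target sum
            s = sum(m[i])
            if s > 0:
                m[i] = [v * rows[i] / s for v in m[i]]
        for j in range(r):                       # rescale each column to its target sum
            s = sum(m[i][j] for i in range(r))
            if s > 0:
                for i in range(r):
                    m[i][j] *= cols[j] / s
    return m

# Hypothetical 4 x 4 table with a zero diagonal.
x = [[0, 5, 3, 2],
     [4, 0, 6, 1],
     [2, 7, 0, 3],
     [3, 1, 4, 0]]
m = ipf_quasi_independence(x)
for row in m:
    print([round(v, 3) for v in row])
```

Each rescaling multiplies a whole row or column by a constant, so the product form $a_i b_j$ on the off-diagonal cells is preserved throughout and the diagonal stays at zero.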



5.2 WAM data 



Here we deal with a real data set called the women and mathematics (WAM) data used in Højsgaard (2003). The data are shown in Table 2. The data consist of the following six factors: (1) Attendance in math lectures (attended=1, not=2), (2) Sex (female=1, male=2), (3) School type (suburban=1, urban=2), (4) Agreement with the statement "I'll need mathematics in my future work" (agree=1, disagree=2), (5) Subject preference (math-science=1, liberal arts=2) and (6) Future plans (college=1, job=2). We consider two models treated by Højsgaard (2003).
The first model is a decomposable model

$$L_{\mathrm{Model1}} = L_{\{1,2,3,5\}} + L_{\{2,3,4,5\}} + L_{\{3,4,5,6\}}.$$

(a) Deviance of Model 1 ($G^2 = 142.4$). (b) Deviance of Model 2 from Model 1 ($G^2 = 66.9$).

Figure 2: The empirical distribution and asymptotic distribution of the deviance $G^2$ for the
wood pewee data. The degrees of freedom are 16 and 10, respectively. The number of steps
in the MCMC procedure is 10^. 

By Theorem 2, a Markov basis of this model is given by

$$\mathcal{B}_{\mathrm{Model1}} = \mathcal{B}_{\{1,2,3,5\},\{2,3,4,5,6\}} \cup \mathcal{B}_{\{1,2,3,4,5\},\{3,4,5,6\}}.$$

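The second model is a split model

$$L_{\mathrm{Model2}} = L_{\{1,2,3,5\}} + L^{i_3=1}_{\{2,5\}} + L^{i_3=1}_{\{4,5\}} + L^{i_3=2}_{\{2,4,5\}} + L_{\{3,4,5,6\}}.$$

This model is indeed a split model (of degree one) with

$$\mathcal{C} = \{\{1,2,3,5\}, \{2,3,4,5\}, \{3,4,5,6\}\},$$

$$Z(\{1,2,3,5\}) = \emptyset, \quad \mathcal{C}_{\{1,2,3,5\}} = \{\{1,2,3,5\}\},$$

$$Z(\{2,3,4,5\}) = \{3\}, \quad \mathcal{C}^{i_3=1}_{\{2,3,4,5\}} = \{\{2,5\}, \{4,5\}\}, \quad \mathcal{C}^{i_3=2}_{\{2,3,4,5\}} = \{\{2,4,5\}\},$$

$$Z(\{3,4,5,6\}) = \emptyset, \quad \mathcal{C}_{\{3,4,5,6\}} = \{\{3,4,5,6\}\}.$$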

The condition (13) is easily checked. The MLE is calculated if one decomposes the table
into those for $i_3 = 1$ and $i_3 = 2$ and then calculates the MLE separately (Lemma Hj). By
Theorem 2 and Lemma 6, a Markov basis of this model is

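$$\mathcal{B}_{\mathrm{Model2}} = \mathcal{B}^{i_3=1}_{\{1,2,5\},\{4,5,6\}} \cup \mathcal{B}_{\{1,2,3,5\},\{2,3,4,5,6\}} \cup \mathcal{B}_{\{1,2,3,4,5\},\{3,4,5,6\}},$$

where we put $\mathcal{B}^{i_3=1}_{\{1,2,5\},\{4,5,6\}} = \mathcal{B}_{\{1,2,5\},\{4,5,6\}} \cap L^{i_3=1}$.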

We calculate the p-value of the deviance of Model 2 from Model 1 by the MCMC 
method. The number of steps in the MCMC procedure is 10^. The result is as follows. 
Table 2: Survey data concerning the attitudes of high-school students in New Jersey
towards mathematics.

                                                Suburban school                Urban school
                                           Female          Male        Female          Male
Plans    Preference     Need math   Attend    Not Attend    Not Attend    Not Attend    Not
College  Math-sciences  Agree           37     27     51     48     51     55    109     86
                        Disagree        16     11     10     19     24     28     21     25
         Liberal arts   Agree           16     15      7      6     32     34     30     31
                        Disagree        12     24     13      7     55     39     26     19
Job      Math-sciences  Agree           10      8     12     15      2      1      9      5
                        Disagree         9      4      8      9      8      9      4      5
         Liberal arts   Agree            7     10      7      3      5      2      1      3
                        Disagree         8      4      6      4     10      9      3      6

Source: Fowlkes et al. (1988)




Figure 3: The empirical and asymptotic distributions of the deviance of Model 2 from
Model 1 (horizontal axis: deviance).



Deviance df p-value (asymptotic) p-value (MCMC) 
1.851 2 0.396 0.399±0.012 

The confidence interval of the p-value is computed on the basis of the batch-means method.
The empirical distribution and asymptotic distribution of the deviance are given in Figure 3.
Since the sample size of the data is large, the results of the asymptotic method
and MCMC method are almost the same.
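The two p-values in the table above can be related to a short computation. For 2 degrees of freedom the chi-square upper tail probability has the closed form $e^{-G^2/2}$, and the batch-means method estimates the Monte Carlo error from the means of consecutive blocks of the chain. The sketch below assumes the MCMC output is available as a 0/1 sequence indicating whether each sampled deviance is at least the observed one; the function names, the simulated indicators and the number of batches are hypothetical, and whether the reported $\pm 0.012$ is one standard error or a wider interval is not stated here.

```python
import math
import random

def chi2_sf_df2(g):
    """Upper tail probability of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-g / 2.0)

def batch_means(indicators, n_batches=20):
    """Estimate a p-value and its standard error from a (possibly correlated)
    0/1 sequence by averaging within consecutive batches of equal size."""
    size = len(indicators) // n_batches
    means = [sum(indicators[b * size:(b + 1) * size]) / size
             for b in range(n_batches)]
    est = sum(means) / n_batches
    var = sum((m - est) ** 2 for m in means) / (n_batches - 1)
    return est, math.sqrt(var / n_batches)

# Asymptotic p-value for the observed deviance 1.851 with df = 2.
print(round(chi2_sf_df2(1.851), 3))  # 0.396

# Stand-in for MCMC output: independent indicators with success probability 0.4.
rng = random.Random(1)
indicators = [1 if rng.random() < 0.4 else 0 for _ in range(10000)]
est, se = batch_means(indicators)
print(est, se)
```

With a real chain the indicators are autocorrelated, which is why the standard error is taken from batch means rather than from the binomial formula.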



6 Concluding remarks 



We proposed a hierarchical subspace model by defining the notion of conformality of linear
subspaces to a given hierarchical model. The notion of an HSM gives a modeling strategy
for multiway tables and unifies various models of interaction effects in the literature. We
illustrated the practical advantage of our modeling strategy with some data sets.

In this paper we only considered log-affine models. Note that there are some nonlinear
models of interaction terms for two-way tables, such as the RC association model. It seems
clear that we can separately fit a nonlinear model to each maximal compact component of a
hierarchical model, as long as the models for dividers are saturated. However, conformality
of a general nonlinear model with respect to a given hierarchical model has to be carefully
defined, and this is left to our future study.

Separation by dividers is closely related to the notion of collapsibility (e.g.
Asmussen and Edwards 1983) of hierarchical models. Localization of statistical inference
to the marginal table of a maximal compact component seems to correspond to
collapsibility onto the component. Also our results on Markov bases for HSMs are closely
related to those of Sullivant (2007). Sullivant (2007) is more concerned with Markov bases
for models with latent variables and marginalization of latent variables. Collapsibility
and marginalization properties of HSMs require further investigation.

In the computation of the MLE for hierarchical models, it is known that the algorithm
can be localized into the marginal tables of maximal cliques of a chordal extension
of the simplicial complex associated with the model, which are smaller than the maximal
compact components (e.g. Badsberg and Malvestuto 2001). By using the notion of ambient
hierarchical model discussed in Section 2.3, it may be possible to localize the inference to
smaller units than maximal extended compact components also in HSMs.

Another important question on hierarchical subspace models is the necessity of saturation
of the model for dividers. Saturation of the model for dividers is a sufficient
condition for localization of statistical inference, but it may not be a necessary condition.
There may exist some important models, for which statistical inferences can be localized 
to extended compact components without the requirement of saturation of dividers. This 
question also needs a careful investigation. 